The genome sequence of a segmented worm, Terebella lapidaria Linnaeus, 1767

We present a genome assembly from an individual Terebella lapidaria (segmented worm; Annelida; Polychaeta; Terebellida; Terebellidae). The genome sequence spans 765.20 megabases. Most of the assembly is scaffolded into 16 chromosomal pseudomolecules. The mitochondrial genome has also been assembled and is 15.97 kilobases in length.


Background
Terebella lapidaria Linnaeus, 1767 is a large-bodied polychaete of the family Terebellidae sensu stricto, originally described from the Mediterranean Sea (Gil, 2011; Lavesque et al., 2021). It has also been recorded from the Adriatic and Aegean Seas as well as the Atlantic coast of France and the southern coast of the UK (Fauvel, 1927; Gil, 2011; Lavesque et al., 2021; Linnaeus, 1767; NBN Trust Partnership, 2024), although its status in Atlantic regions is considered uncertain at this time until specimens can be compared to those from the Mediterranean (Lavesque et al., 2021). Around the UK specifically, Terebella lapidaria has been recorded from Devon and Cornwall, along the south-west coast of England and the Bristol Channel, and is considered a non-native species of interest to Northern Ireland. This species primarily inhabits shallow and intertidal waters, under rocks, in rock or shale crevices, shale gravel or muddy bottoms (Gil, 2011; Lavesque et al., 2021; Marine Biological Association, 1957). They can reach 80-160 segments in size and up to 9 centimetres in length (Fauvel, 1927).
The genus Terebella is characterised by having notochaetae on more than 25 segments with no clear definition between thorax and abdomen, and three pairs of branched branchiae. Of 37 species currently recognised within the genus, only T. lapidaria and Terebella banksyi Lavesque, Daffe, Londoño-Mesa & Hutchings, 2021 occur either around or in close proximity to the UK. Terebella banksyi is currently known only from its type locality in Arcachon Bay, France. The two species can be distinguished through the placement of the branchial pairs (on segments II-IV on T. lapidaria and discontinuous on segments II-III and segment V on T. banksyi) and the number of nephridial and genital papillae (five pairs on T. lapidaria, twelve pairs on T. banksyi) (Lavesque et al., 2021).
The genome of Terebella lapidaria was sequenced as part of the Darwin Tree of Life Project, and represents the first of its kind for this species.

Genome sequence report
The genome of an adult Terebella lapidaria (Figure 1) was sequenced using Pacific Biosciences single-molecule HiFi long reads, generating a total of 22.32 Gb (gigabases) from 2.37 million reads, providing approximately 34-fold coverage. Primary assembly contigs were scaffolded with chromosome conformation Hi-C data, which produced 137.71 Gbp from 911.98 million reads, yielding an approximate coverage of 180-fold. Specimen and sequencing information is summarised in Table 1.
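As an illustration of how fold coverage relates to sequencing yield, the following minimal sketch divides the total bases quoted above by a genome size. The choice of denominator is an assumption: the note does not state which genome size was used, so the final assembly span of 765.20 Mb is taken here. The function and variable names are illustrative only.

```python
# Figures quoted in the text above.
HIFI_BASES = 22.32e9   # total HiFi yield, bases
HIFI_READS = 2.37e6    # number of HiFi reads
HIC_BASES = 137.71e9   # total Hi-C yield, bases

# Assumption: the final assembly span (765.20 Mb) stands in for genome size.
GENOME_SIZE = 765.20e6


def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Return sequencing depth as total bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


hifi_cov = fold_coverage(HIFI_BASES, GENOME_SIZE)
hic_cov = fold_coverage(HIC_BASES, GENOME_SIZE)
mean_hifi_read_len = HIFI_BASES / HIFI_READS

print(f"HiFi coverage: {hifi_cov:.1f}x")
print(f"Hi-C coverage: {hic_cov:.1f}x")
print(f"Mean HiFi read length: {mean_hifi_read_len:.0f} bp")
```

With the assembly span as denominator, the Hi-C figure comes out at roughly 180-fold, in line with the text, while the HiFi figure comes out at about 29-fold. The reported 34-fold presumably rests on a different genome-size estimate, which is not specified here.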
Manual assembly curation corrected 32 missing joins or misjoins and 13 haplotypic duplications, reducing the assembly length by 0.8% and the scaffold number by 3.03%. The final assembly has a total length of 765.20 Mb in 576 sequence scaffolds with a scaffold N50 of 44.0 Mb (Table 2). The snail plot in Figure 2 provides a summary of the assembly statistics, while the distribution of assembly scaffolds on GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (97.22%) of the assembly sequence was assigned to 16 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 3). While not fully phased, the assembly deposited is of one haplotype. Contigs corresponding to the second haplotype have also been deposited. The mitochondrial genome was also assembled and can be found as a contig within the multifasta file of the genome submission.
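For readers unfamiliar with the scaffold N50 statistic quoted above, the following minimal sketch shows the usual definition: the length of the scaffold at which the running total of size-sorted scaffolds first reaches half of the assembly span. The function name `nx` and the toy scaffold lengths are hypothetical and are not taken from the assembly or from the tools used to produce Table 2.

```python
from typing import Iterable


def nx(lengths: Iterable[int], fraction: float = 0.5) -> int:
    """Length L such that scaffolds of length >= L cover `fraction` of the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Hypothetical scaffold lengths in bp (not taken from the assembly).
toy_scaffolds = [100, 80, 60, 40, 20]

print("N50:", nx(toy_scaffolds, 0.5))
print("N90:", nx(toy_scaffolds, 0.9))
```

The same function gives N90 when called with a fraction of 0.9, which is how the two arcs in a snail plot relate to each other.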

Sample acquisition
Adult specimens of Terebella lapidaria were collected from Batten Bay, Devon, UK (latitude 50.36, longitude -4.13) on 2020-11-15. The specimens were collected by Patrick Adkins, John Bishop, Nova Mieszkowska (all Marine Biological Association) and Teresa Darbyshire and Anna Holmes (both Amgueddfa Cymru) and identified by Teresa Darbyshire. The specimens were preserved in liquid nitrogen. One specimen (specimen ID MBA-201115-002C, ToLID wtTerLapi1) was used for PacBio DNA sequencing, and another (specimen ID MBA-201115-002E, ToLID wtTerLapi3) for Hi-C and RNA sequencing.
The initial species identification was verified by an additional DNA barcoding process according to the framework developed by Twyford et al. (2024). A small sample was dissected from the specimen and stored in ethanol, while the remaining parts of the specimen were shipped on dry ice to the Wellcome Sanger Institute (WSI). The tissue was lysed, the COI marker region was amplified by PCR, and amplicons were sequenced and compared to the BOLD database, confirming the species identification (Crowley et al., 2023). Following whole genome sequence generation, the relevant DNA barcode region was also used alongside the initial barcoding data for sample tracking at the WSI (Twyford et al., 2024). The standard operating procedures for Darwin Tree of Life barcoding have been deposited on protocols.io (Beasley et al., 2023).

Nucleic acid extraction
The workflow for high molecular weight (HMW) DNA extraction at the WSI Tree of Life Core Laboratory includes a sequence of core procedures: sample preparation, sample homogenisation, DNA extraction, fragmentation, and clean-up.
In sample preparation, the wtTerLapi1 sample was weighed and dissected on dry ice (Jay et al., 2023). Tissue from the anterior body was homogenised using a PowerMasher II tissue disruptor (Denton et al., 2023a).
HMW DNA was extracted using the Automated MagAttract v1 protocol (Sheerin et al., 2023). DNA was sheared into an average fragment size of 12-20 kb in a Megaruptor 3 system (Todorovic et al., 2023). Sheared DNA was purified by solid-phase reversible immobilisation (Strickland et al., 2023): in brief, the method employs AMPure PB beads to eliminate shorter fragments and concentrate the DNA. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and Qubit Fluorometer using the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system.
RNA was extracted from posterior body tissue of wtTerLapi3 in the Tree of Life Laboratory at the WSI using the RNA Extraction: Automated MagMax™ mirVana protocol (do Amaral et al., 2023). The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer using the Qubit RNA Broad-Range Assay kit. Analysis of the integrity of the RNA was done using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.
Protocols developed by the WSI Tree of Life laboratory are publicly available on protocols.io (Denton et al., 2023b).

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturers' instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences Sequel IIe (HiFi) and Illumina NovaSeq X (RNA-Seq) instruments. Hi-C data were also generated from mid-body tissue of wtTerLapi3 using the Arima-HiC v2 kit. The Hi-C sequencing was performed using paired-end sequencing with a read length of 150 bp on the Illumina NovaSeq 6000 instrument.
The mitochondrial genome was assembled using MitoHiFi (Uliano-Silva et al., 2023), which runs MitoFinder (Allio et al., 2020) and uses these annotations to select the final mitochondrial contig and to ensure the general quality of the sequence.

Assembly curation
The assembly was decontaminated using the Assembly Screen for Cobionts and Contaminants (ASCC) pipeline (article in preparation). Manual curation was primarily conducted using PretextView (Harry, 2022), with additional insights provided by JBrowse2 (Diesh et al., 2023) and HiGlass (Kerpedjiev et al., 2018). Scaffolds were visually inspected and corrected as described by Howe et al. (2021). Any identified contamination, missed joins, and mis-joins were corrected, and duplicate sequences were tagged and removed. The entire process is documented at https://gitlab.com/wtsi-grit/rapid-curation (article in preparation).
Table 4 contains a list of relevant software tool versions and sources.

Wellcome Sanger Institute - Legal and Governance
The Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use. The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material

Background section
For Figure 1 (the specimen figure), it would be good to add a scale bar indicating the length of the specimen, since the authors mention the varying length of this species ("...They can reach 80-160 segments in size and up to 9 centimetres in length (Fauvel, 1927)...").

Genome sequence report
The BUSCO tool version number should be checked and be consistent with that mentioned in Table 4.

Assembly curation section
The authors should provide more details about the decontamination of the assembly, since the method they cite is their own work in preparation: "...The assembly was decontaminated using the Assembly Screen for Cobionts and Contaminants (ASCC) pipeline (article in preparation)...".
Additionally, the authors should provide more details and a clear explanation of the entire curation process. In its current form the method is not clear, since the authors again cite their unpublished work (article in preparation). Besides, the link provided (https://gitlab.com/wtsi-grit/rapid-curation) does not clearly show the entire process.
Is the rationale for creating the dataset(s) clearly described?

Maria Nilsson
Senckenberg Biodiversity and Climate Research Centre, Frankfurt am Main, Germany
The nuclear genome has been sequenced from the marine segmented worm Terebella lapidaria. The species occurs in the Mediterranean but is also found in French and UK waters. Two individuals were used for the genome assembly. The individuals were DNA barcoded before the nuclear genome was sequenced.
The assembly is based on PacBio long-read sequences and Hi-C. The final length of the manually curated assembly is 765 Mb and it contains 16 chromosomal pseudomolecules. It was not possible to assemble the sex chromosomes. The contig N50 length is 6.7 Mb. The BUSCO score, which counts the number of complete benchmarking genes, is 96.3%. The overall QV value is 60. The statistics indicate that it is a good assembly, which is generally difficult to achieve from marine soft-bodied animals.

Minor comments:
1) In the background it is stated that the genome assembly of Terebella lapidaria is "the first of its kind". Please clarify in what sense it is the first: is it the first from that family, genus, etc.?
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: phylogenomics, transposable elements, mitogenomics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Figure 2. Genome assembly of Terebella lapidaria, wtTerLapi1.1: metrics. The BlobToolKit Snailplot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference with each bin representing 0.1% of the 765,247,992 bp assembly. The distribution of scaffold lengths is shown in dark grey with the plot radius scaled to the longest scaffold present in the assembly (100,833,644 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (43,950,887 and 34,304,479 bp), respectively. The pale grey spiral shows the cumulative scaffold count on a log scale with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the metazoa_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/wtTerLapi1_1/dataset/wtTerLapi1_1/snail.

Figure 3. Genome assembly of Terebella lapidaria, wtTerLapi1.1: BlobToolKit GC-coverage plot. Sequences are coloured by phylum. Circles are sized in proportion to sequence length. Histograms show the distribution of sequence length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/wtTerLapi1_1/dataset/wtTerLapi1_1/blob.
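The GC axis of a plot such as Figure 3 rests on a per-sequence GC proportion. The following minimal sketch shows one common definition, in which ambiguous bases such as N are excluded from the denominator. It is a generic illustration under that assumption, not the calculation BlobToolKit itself performs, and the function name and toy sequence are hypothetical.

```python
def gc_proportion(sequence: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


# Hypothetical toy sequence; N marks an unknown base and is ignored.
toy = "ATGCGCNNat"
print(f"GC proportion: {gc_proportion(toy):.2f}")
```

Excluding N from the denominator keeps scaffolds with long gaps from appearing artificially AT-rich; other conventions are possible.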
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.

Figure 4. Genome assembly of Terebella lapidaria, wtTerLapi1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all sequences. Coloured lines show cumulative lengths of sequences assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/wtTerLapi1_1/dataset/wtTerLapi1_1/cumulative.

Figure 5. Genome assembly of Terebella lapidaria, wtTerLapi1.1: Hi-C contact map of the wtTerLapi1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=BaRPjPiNRWWAmO2tmlp1Kw.

This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
Reviewer Report 29 August 2024 https://doi.org/10.21956/wellcomeopenres.25130.r95060 © 2024 Nilsson M. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.